A Comparison of Pooled and Sampled Relevance Judgments in the TREC 2006 Terabyte Track
Author
Abstract
Pooling is the most common technique used to build modern test collections. Evidence is mounting that pooling may not yield reusable test collections for very large document sets. This paper describes the approach taken in the TREC 2006 Terabyte Track: an initial shallow pool was judged to gather relevance information, which was then used to draw a random sample of further documents to judge. The sample judgments rank systems somewhat differently than the pool. Some analysis and plans for further research are discussed.
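Conceptually, the two-stage procedure described above (judge a shallow pool, then judge a random sample of further documents) might be sketched as follows. This is a minimal illustration only: it samples uniformly beyond the pool, whereas the track used the pool's relevance information to inform the sample, and all function and variable names here are hypothetical.

```python
import random

def shallow_pool(runs, depth):
    """Union of the top-`depth` documents from each system's ranked list."""
    pool = set()
    for ranking in runs:
        pool.update(ranking[:depth])
    return pool

def sample_beyond_pool(runs, pool, sample_size, max_depth, seed=0):
    """Uniform random sample of documents ranked below the pool depth
    (a simplification of the track's judgment-informed sampling)."""
    candidates = set()
    for ranking in runs:
        candidates.update(ranking[:max_depth])
    candidates -= pool                    # exclude already-judged pool docs
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), min(sample_size, len(candidates)))
```

The key property is that the sampled documents are drawn from outside the judged pool, so they can reveal relevant documents the shallow pool missed.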
Similar Papers
TREC 2006 Legal Track Overview
This paper describes the first year of a new TREC track focused on “e-discovery” of business records and other materials. A large collection of scanned documents produced by multiple real-world discovery requests was adopted as the basis for the test collection. Topic statements were developed using a process representative of current practice in e-discovery applications, with both Boolean and ...
The Hedge Algorithm for Metasearch at TREC 2006
Aslam, Pavlu, and Savell [3] introduced the Hedge algorithm for metasearch which effectively combines the ranked lists of documents returned by multiple retrieval systems in response to a given query and learns which documents are likely to be relevant from a sequence of on-line relevance judgments. It has been demonstrated that the Hedge algorithm is an effective technique for metasearch, ofte...
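The multiplicative-weights idea behind Hedge can be sketched in a toy form: maintain a weight per system, repeatedly select the highest-weighted-score unjudged document, obtain its judgment, and penalize systems whose ranking disagreed with it. Note the score combination and loss function below are simplifications for illustration, not the exact formulation of Aslam, Pavlu, and Savell.

```python
def hedge_metasearch(ranked_lists, judge, beta=0.9, rounds=5):
    """Toy Hedge-style fusion: multiplicative weight updates over systems,
    driven by a sequence of on-line relevance judgments."""
    n = len(ranked_lists)
    weights = [1.0] * n
    judged = {}
    for _ in range(rounds):
        # Weighted reciprocal-rank score over unjudged documents.
        scores = {}
        for w, ranking in zip(weights, ranked_lists):
            for rank, doc in enumerate(ranking):
                if doc not in judged:
                    scores[doc] = scores.get(doc, 0.0) + w / (rank + 1)
        if not scores:
            break
        doc = max(scores, key=scores.get)
        rel = judge(doc)            # on-line relevance judgment (0 or 1)
        judged[doc] = rel
        # Simplified loss: penalize ranking a non-relevant doc high
        # or a relevant doc low.
        for i, ranking in enumerate(ranked_lists):
            pos = ranking.index(doc) if doc in ranking else len(ranking)
            frac = pos / max(len(ranking) - 1, 1)
            loss = frac if rel else (1.0 - frac)
            weights[i] *= beta ** loss
    return judged, weights
```

Systems that consistently rank relevant documents high retain large weights, so the combined ranking increasingly favors them as judgments accumulate.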
متن کاملA Practical Sampling Strategy for Efficient Retrieval Evaluation
We consider the problem of large-scale retrieval evaluation, with a focus on the considerable effort required to judge tens of thousands of documents using traditional test collection construction methodologies. Recently, two methods based on random sampling were proposed to help alleviate this burden: While the first method proposed by Aslam et al. is very accurate and efficient, it is also ve...
IO-Top-k at TREC 2006: Terabyte Track
This paper describes the setup and results of our contribution to the TREC 2006 Terabyte Track. Our implementation was based on the algorithms proposed in [1], “IO-Top-k: Index-Access Optimized Top-k Query Processing” (VLDB ’06), with a main focus on the efficiency track.
The University of Amsterdam at the TREC 2006 Terabyte Track
As part of the TREC 2006 Terabyte track, we conducted a range of experiments investigating the effects of larger test collections for both adhoc and known-item topics. In this paper, we document our official submissions to the TREC 2006 Terabyte track and conduct a number of more extensive experiments. First, we look at the amount of smoothing required for largescale collections. Second, we inv...